27 research outputs found

Biomedical ontology MeSH improves document clustering qualify on MEDLINE articles: A comparison study

Author: Hu Xiaohua
Yoo Illhoi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

19th IEEE International Symposium on Computer-Based Medical Systems, CBMS 2006, Salt Lake City, UT. Document clustering has been used for better document retrieval, document browsing, and text mining. In this paper, we investigate whether the biomedical ontology MeSH improves clustering quality for MEDLINE articles. For this investigation, we perform a comprehensive comparison study of various document clustering approaches, including hierarchical clustering methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and Suffix Tree Clustering (STC), in terms of efficiency, effectiveness, and scalability. According to our experimental results, the biomedical ontology MeSH significantly enhances clustering quality on biomedical documents. In addition, our results show that decent document clustering approaches, such as Bisecting K-means, K-means, and STC, gain some benefit from the MeSH ontology, while the hierarchical algorithms, which show the poorest clustering quality, do not reap the benefit of the MeSH ontology.

Crossref

Drexel Libraries E-Repository and Archives

A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

Author: Hu Xiaohua
Yoo Illhoi
Publication date: 29/07/2006

Presented at the 2006 ACM/IEEE Joint Conference on Digital Library (JCDL 2006), June 11-15, 2006, Chapel Hill, NC, USA. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Conference-papers/JCDL06.pdf. Document clustering has been used for better document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of various document clustering approaches, including three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and Suffix Tree Clustering, in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as the synonym/hypernym/hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics, such as misclassification index, F-measure, cluster purity, and entropy, on very large article sets from MEDLINE, the largest digital library in biomedicine.
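
Two of the evaluation metrics named in the abstract above, cluster purity and entropy, can be illustrated with a minimal sketch. This uses the common textbook definitions and is not taken from the paper; the function names and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def _class_counts(cluster_ids, true_labels):
    """Count true class labels inside each cluster (hypothetical helper)."""
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        counts[cluster][label] += 1
    return counts


def purity(cluster_ids, true_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    counts = _class_counts(cluster_ids, true_labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(true_labels)


def entropy(cluster_ids, true_labels):
    """Size-weighted average of the per-cluster class entropy (in bits)."""
    counts = _class_counts(cluster_ids, true_labels)
    total = len(true_labels)
    weighted = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((k / size) * math.log2(k / size) for k in c.values())
        weighted += (size / total) * h
    return weighted


# Toy example: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes; a perfect clustering has purity 1 and entropy 0.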

Drexel Libraries E-Repository and Archives

A coherent biomedical literature clustering and summarization approach through ontology-enriched graphical representations

Author: Hu Xiaohua
Song Il-Yeol
Yoo Illhoi
Publication date: 22/02/2007

Data Warehousing and Knowledge Discovery, Proceedings 4081, pp. 374-383, DOI: http://dx.doi.org/10.1007/11823728. In this paper, we introduce a coherent biomedical literature clustering and summarization approach that employs a graphical representation method for text using a biomedical ontology. The key to the approach is to construct document cluster models as semantic chunks capturing the core semantic relationships in the ontology-enriched scale-free graphical representation of documents. These document cluster models are used for both document clustering and text summarization by constructing a Text Semantic Interaction Network (TSIN). Our extensive experimental results indicate that our approach shows a 45% cluster quality improvement and a 72% clustering reliability improvement, in terms of misclassification index, over Bisecting K-means as a leading document clustering approach. In addition, our approach provides a concise but rich text summary in key concepts and sentences. The primary contribution of this paper is that we introduce a coherent biomedical literature clustering and summarization approach that takes advantage of ontology-enriched graphical representations. Our approach significantly improves the quality of document clusters and the understandability of documents through summaries.

Drexel Libraries E-Repository and Archives

A semantic approach for mining hidden links from complementary and non-interactive biomedical literature

Author: Hu Xiaohua
Yoo Illhoi
Zhang Xiaodan
Zhang Yanqing
Publication date: 29/07/2006

Presented at the 2006 SIAM Conference on Data Mining (SIAM DM 2006). Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Conference-papers/SIAM06-Hu.pdf. Two complementary and non-interactive sets of articles, when considered together, can reveal useful information of scientific interest not apparent in either of the two sets alone. Swanson called such hidden links undiscovered public knowledge (UPK). The novel connection between Raynaud disease and fish oils was uncovered from complementary and non-interactive biomedical literature by Swanson in 1986. Since then, there have been many approaches to uncovering UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections. This paper proposes a semantic-based mining model for undiscovered public knowledge using the biomedical literature. Our method replaces manual ad-hoc pruning with semantic knowledge from the biomedical ontologies. Using the semantic types and semantic relationships of the biomedical concepts, our prototype system can identify the relevant concepts collected from Medline and generate novel hypotheses between these concepts. The system successfully replicates Swanson’s two famous discoveries: Raynaud disease/fish oils and migraine/magnesium. Compared with previous approaches such as LSI-based and traditional association rule-based methods, our method generates far fewer but more relevant novel hypotheses, and requires much less human intervention in the discovery procedure.

Drexel Libraries E-Repository and Archives

Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule

Author: Hu Xiaohua
Li Guangren
Wu Daniel
Xu Xuheng George
Yoo Illhoi
Zhang Xiaodan
Zhou Xiaohua
Publication date: 29/07/2006

Paper accepted for publication in Journal of Information Systems. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Journal-papers/JIS_hu2006.pdf. The novel connection between Raynaud disease and fish oils was uncovered from two disjoint biomedical literature sets by Swanson in 1986. Since then, there have been many approaches to uncovering novel connections by mining the biomedical literature. One of the popular approaches is to adapt the Association Rule (AR) method to automatically identify implicit novel connections between concept A and concept C from two disjoint sets of documents through an intermediate concept B. Since the A and C concepts do not occur together in the same data set, the mining goal is to find novel connections among A and C concepts in the disjoint data sets. The approach first applies association rule mining to the two disjoint biomedical literature sets separately to generate two rule sets (A→B, B→C), and then applies the transitive law to get the novel connections A→C. However, this approach generates a huge number of possible connections among the millions of biomedical concepts, and many of these hypothetical connections are spurious, useless, and/or biologically meaningless. Thus it is essential to develop a new approach that generates highly likely novel and biologically relevant connections among the biomedical concepts. This paper presents a Biomedical Semantic-based Association Rule System (Bio-SARS) that significantly reduces spurious, useless, or biologically irrelevant connections through semantic filtering. Compared to other approaches such as LSI and the traditional association rule-based approach, our approach generates far fewer rules, and many of these rules represent relevant connections among biological concepts.
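
The transitive step described in the abstract above (combine A→B rules from one literature set with B→C rules from the other to propose A→C) can be sketched as follows. This is only an illustrative sketch of the generic transitive join, not the Bio-SARS system: the function name, the `known_pairs` parameter, and the toy rules are assumptions, and the semantic filtering the paper contributes is not implemented here.

```python
from collections import defaultdict


def transitive_candidates(rules_ab, rules_bc, known_pairs=frozenset()):
    """Join A->B and B->C rules on the shared B concept.

    rules_ab, rules_bc: iterables of (antecedent, consequent) pairs.
    known_pairs: (A, C) pairs already co-occurring in the literature,
                 which are therefore not novel (hypothetical parameter).
    Returns a dict mapping each candidate (A, C) to its set of linking Bs.
    """
    # Index the second rule set by its antecedent B for fast lookup.
    consequents_of = defaultdict(set)
    for b, c in rules_bc:
        consequents_of[b].add(c)

    candidates = defaultdict(set)
    for a, b in rules_ab:
        for c in consequents_of.get(b, ()):
            if c != a and (a, c) not in known_pairs:
                candidates[(a, c)].add(b)
    return dict(candidates)


# Toy rules for illustration only; not mined from any real data set.
rules_ab = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
rules_bc = [("blood viscosity", "Raynaud disease"),
            ("platelet aggregation", "Raynaud disease")]
print(transitive_candidates(rules_ab, rules_bc))
```

Because every shared B concept yields a candidate, the number of A→C pairs grows quickly with the size of the rule sets, which is the explosion of spurious connections that the abstract says semantic filtering is meant to reduce.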

Drexel Libraries E-Repository and Archives

A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method

Author: A Wu
AL Barabasi
F Beil
G Erkan
Il-Yeol Song
Illhoi Yoo
J Ghosh
J Kleinberg
LAN Amaral
M Steinbach
MA Hearst
MEJ Newman
P Erdos
P Pantel
R Ferrer-Cancho
R Rada
RA Hanneman
S Salton
T Nomato
Xiaohua Hu
Y Zeng
Publication venue: BioMed Central
Publication date: 01/01/2007

Crossref

Springer - Publisher Connector

PubMed Central

Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

Author: B Suomela
C Burges
C Sneiderman
D States
F Radlinski
G Poulter
G Salton
H Oh
H Yu
Hwanjo Yu
Ilhwan Ko
J Xu
Jinoh Oh
L Murphy
M Siadaty
Sungchul Kim
T Joachims
T Qin
Taehoon Kim
V Cherkassky
W Hersh
Wook-Shin Han
X Geng
Y Cao
Y Lin
Yoo Illhoi
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to a learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and users have to provide a large number of training documents to get reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into the RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.

Crossref

Springer - Publisher Connector

PubMed Central

Pohang University of Science and Technology (POSTECH)

A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

Author: A Broder
A Chang
A Hoogendam
A Névéol
A Sood
Abu Saleh Mohammad Mosa
AM Smith
B Thirion
BJ Jansen
C Nankivell
C Silverstein
D Mattox
E Bernstam
E Giglia
E Motschall
E-M Lacroix
Evidence-Based Medicine Working Group
F Facca
G Murray
I Yoo
Illhoi Yoo
J Han
JM Sáez Gómez
JO Ebbert
JR Herskovic
JW Ely
K Bahaadinbeigy
LV Gault
M Dimitrijević
M Hall
M Muin
M Shultz
ME Anders
NC Baker
PC Wong
PE Gallagher
R Agrawal
R Haux
RB Haynes
RI Doğan
T-J Chen
V Sriganesh
Y-M Tai
Z Lu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

ACKNOWLEDGEMENTS

Author: Illhoi Yoo

I am indebted to many people for their support and advice toward the successful completion of my Ph.D. degree and this dissertation. My deepest gratitude goes to my supervisor, Dr. Xiaohua Hu, for his guidance and assistance with this dissertation as well as all the research during my doctoral studies over the past four years. He has helped me to investigate in depth and to remain focused on achieving my goal. I am grateful to my committee members, Dr. Il-Yeol Song, Dr. Xia Lin, Dr. Bahrad A. Sokhansanj, and Dr. Don Goelman, for their invaluable advice and suggestions. In particular, Dr. Song has always been meticulous in proofreading my research papers. His advice on both academic and non-academic matters has been inestimable. I would like to express my appreciation to my parents, SungTae Yoo and SunJa Park, and to my parents-in-law, TaeWhan Jung and SoonAe Goo, for their love, support, and encouragement. I would like to express my sincere thanks to my wife, YoungJae Jung, for her love and sacrifice. Without her constant sacrifice, this thesis would not have bee

CiteSeerX